Systems of Nonlinear Equations

Problem. Let us find a solution $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ of the system of $n$ equations

$$f_1(x)=0,\quad f_2(x)=0,\quad \ldots,\quad f_n(x)=0,$$

in $n$ unknowns $x=(x_1,x_2,\ldots,x_n)$. Denoting $f=(f_1,f_2,\ldots,f_n)^T$, we can write this system as

$$f(x)=0.$$

We shall describe Newton's method and three quasi-Newton methods:

  1. Broyden's method,

  2. the Davidon-Fletcher-Powell method, and

  3. the Broyden-Fletcher-Goldfarb-Shanno method.

Given a starting approximation $x^{(0)}$, all methods generate a sequence of points $x^{(k)}$ which, under certain conditions, converges to the solution $\xi$.

Remark. The description of the methods and the examples are taken from the textbook Numerička matematika, Chapter 4.4.


Newton's method

The Jacobian or Jacobi matrix of the function $f$ at the point $x$ is the matrix of first partial derivatives

$$J(f,x)=\begin{pmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \dfrac{\partial f_1(x)}{\partial x_2} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n}\\
\dfrac{\partial f_2(x)}{\partial x_1} & \dfrac{\partial f_2(x)}{\partial x_2} & \cdots & \dfrac{\partial f_2(x)}{\partial x_n}\\
\vdots & \vdots & & \vdots\\
\dfrac{\partial f_n(x)}{\partial x_1} & \dfrac{\partial f_n(x)}{\partial x_2} & \cdots & \dfrac{\partial f_n(x)}{\partial x_n}
\end{pmatrix}.$$

Given a starting approximation $x^{(0)}$, we compute the sequence of points

$$x^{(k+1)}=x^{(k)}-s^{(k)},\qquad k=0,1,2,\ldots,$$

where $s^{(k)}$ is the solution of the system of linear equations

$$J(f,x^{(k)})\,s=f(x^{(k)}).$$

Jacobians are computed using the package ForwardDiff.jl. Plotting is done using the package Plots.jl.
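The iteration above can be sketched as follows. This is a minimal illustration of our own, not the notebook's `Newton` function; the names `newton_sketch` and `f` are ours.

```julia
using ForwardDiff
using LinearAlgebra

# Minimal Newton iteration for f(x) = 0:
# solve J(f, x⁽ᵏ⁾) s = f(x⁽ᵏ⁾) and set x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ - s.
function newton_sketch(f, x0; tol = 1e-10, maxiter = 100)
    x = float.(x0)
    for k = 1:maxiter
        J = ForwardDiff.jacobian(f, x)  # Jacobi matrix at the current point
        s = J \ f(x)                    # solve the linear system J s = f(x)
        x -= s
        norm(s) < tol && return x, k    # stop once the correction is tiny
    end
    return x, maxiter
end

# The system from Example 1 below, started near its zero (1, 1)
f(x) = [2(x[1] + x[2])^2 + (x[1] - x[2])^2 - 8, 5x[1]^2 + (x[2] - 3)^2 - 9]
x, k = newton_sketch(f, [0.8, 1.2])
```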

Newton (generic function with 2 methods)

Example 1

(Dennis and Schnabel, 1996) The solutions of the system

$$\begin{aligned}
2(x+y)^2+(x-y)^2-8&=0\\
5x^2+(y-3)^2-9&=0
\end{aligned}$$

are the points $T_1=(1,1)$ and $T_2\approx(-1.18,1.59)$.

f₁ (generic function with 1 method)

Let us plot the functions and the contours in order to approximately locate the zeros:


We see that the approximate zeros are $T_1\approx(-1.2,1.5)$ and $T_2=(1,1)$. Moreover, $T_2$ is exactly equal to $(1,1)$ (one iteration in the third computation). Furthermore, the method does not always converge (fourth computation).

J₁ (generic function with 1 method)

Example 2

(Dennis and Schnabel, 1996) The solutions of the system

$$\begin{aligned}
x_1^2+x_2^2-2&=0\\
e^{x_1-1}+x_2^3-2&=0
\end{aligned}$$

are the points $T_1=(1,1)$ and $T_2\approx(-0.71,1.22)$.


Example 3

(Dennis and Schnabel, 1996) Solve $f(x)=0$, where

$$f(x)=\begin{bmatrix} x_1 \\ x_2^2+x_2 \\ e^{x_3}-1 \end{bmatrix}.$$

The exact solutions are $T_1=(0,0,0)$ and $T_2=(0,-1,0)$. We shall compute the solutions using several starting points.
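Both exact solutions are easy to verify directly (a minimal self-contained check; the name `f` is ours):

```julia
# The system from Example 3; both exact solutions give the zero vector.
f(x) = [x[1], x[2]^2 + x[2], exp(x[3]) - 1]

f([0.0, 0.0, 0.0])   # → [0.0, 0.0, 0.0]
f([0.0, -1.0, 0.0])  # → [0.0, 0.0, 0.0]
```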

J₃ (generic function with 1 method)

Example 4

(Rosenbrock parabolic valley) Given is the function

$$f(x)=100(x_2-x_1^2)^2+(1-x_1)^2.$$

We want to find possible extremal points, that is, we want to solve the equation

$$\operatorname{grad} f(x)=0.$$

f₄ (generic function with 1 method)

By observing contours, we conclude that this example is numerically demanding, while it is easy to see analytically that the only zero is T=(1,1).

Here the vector function is given as the gradient of the scalar function $f$, so the Jacobi matrix is computed using the function ForwardDiff.hessian(), which computes the matrix of second partial derivatives of the scalar function $f$.
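A minimal sketch of this setup (the names `f`, `g` and `J` are ours):

```julia
using ForwardDiff

# Rosenbrock function from Example 4; the system to solve is grad f(x) = 0,
# and ForwardDiff.hessian supplies the Jacobi matrix of the gradient.
f(x) = 100(x[2] - x[1]^2)^2 + (1 - x[1])^2
g(x) = ForwardDiff.gradient(f, x)  # the vector function whose zero we seek
J(x) = ForwardDiff.hessian(f, x)   # its Jacobi matrix

g([1.0, 1.0])  # the gradient vanishes at the zero T = (1, 1)
```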


Example 5

Let

$$f(x)=\sum_{i=1}^{11}\left(x_3\exp\left(-\frac{(t_i-x_1)^2}{x_2}\right)-y_i\right)^2,$$

where the pairs $(t_i,y_i)$ are given by the table:

 i  |   1     2    3    4    5    6    7    8    9   10    11
 tᵢ |   0     1    2    3    4    5    6    7    8    9    10
 yᵢ | 0.001 0.01 0.04 0.12 0.21 0.25 0.21 0.12 0.04 0.01 0.001

We want to solve the equation $\operatorname{grad} f(x)=0$. Unlike the previous examples, where the condition number is $\kappa(J)=O(10)$ in Examples 1, 2 and 3 and $\kappa(J)=O(1000)$ in Example 4, here

$$\kappa(J)>O(10^6),$$

so the method is inaccurate and does not converge towards the exact solution $x=(4.93,2.62,0.28)$.

f₅ (generic function with 1 method)

Broyden's method

Given a starting approximation $x_0$ and a matrix $B_0$, for each $k=0,1,2,\ldots$ we compute:

$$\begin{aligned}
B_k s_k &= -f(x_k) \quad \text{(a linear system)}\\
x_{k+1} &= x_k + s_k\\
y_k &= f(x_{k+1})-f(x_k)\\
B_{k+1} &= B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}
\end{aligned}$$

In this manner, we avoid computing the Jacobi matrix in each step. We can take $B_0=J(x_0)$, but also some other matrix.
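The update can be sketched as follows (our own minimal illustration, not the notebook's `Broyden` function; names are ours), here with $B_0$ taken to be the Jacobian at $x_0$:

```julia
using ForwardDiff
using LinearAlgebra

# Minimal Broyden iteration: one linear solve and one rank-one update per
# step, with no Jacobian evaluations after the initial B₀.
function broyden_sketch(f, x0, B0; tol = 1e-10, maxiter = 100)
    x = float.(x0)
    B = float.(B0)
    fx = f(x)
    for k = 1:maxiter
        s = -(B \ fx)                       # B_k s_k = -f(x_k)
        x += s
        fxnew = f(x)
        y = fxnew - fx
        B += (y - B * s) * s' / dot(s, s)   # rank-one update of B
        fx = fxnew
        norm(s) < tol && return x, k
    end
    return x, maxiter
end

# The system from Example 1, with B₀ = J(x₀)
f(x) = [2(x[1] + x[2])^2 + (x[1] - x[2])^2 - 8, 5x[1]^2 + (x[2] - 3)^2 - 9]
x, k = broyden_sketch(f, [0.8, 1.2], ForwardDiff.jacobian(f, [0.8, 1.2]))
```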

Broyden (generic function with 2 methods)

Davidon-Fletcher-Powell (DFP) method

DFP is an optimization method which finds extremal points of the function $F:\mathbb{R}^n\to\mathbb{R}$, in which case $f(x)=\operatorname{grad} F(x)$.

Given an initial approximation $x_0$ and a matrix $H_0$, for $k=0,1,2,\ldots$ we compute:

$$\begin{aligned}
s_k &= -H_k f(x_k)\\
\beta_k &= \operatorname*{arg\,min}_\beta F(x_k+\beta s_k)\\
s_k &= \beta_k s_k\\
x_{k+1} &= x_k + s_k\\
y_k &= f(x_{k+1})-f(x_k)\\
H_{k+1} &= H_k + \frac{s_k s_k^T}{y_k^T s_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}.
\end{aligned}$$

We can take $H_0=I$. Notice that the iteration step does not require solving a system of linear equations; instead, all updates are performed using $O(n^2)$ operations.

The one-dimensional minimization along the line $x_k+\beta s_k$ is performed by finding a zero of the directional derivative using bisection.
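The bisection line search can be sketched as follows. This is a minimal illustration under the assumption that the bracket $[a,b]$ contains the minimizer; the name `linesearch_bisection` is ours, not the notebook's `Bisection`.

```julia
using ForwardDiff

# Find β minimizing φ(β) = F(x + β s) by bisecting on the directional
# derivative φ'(β) = grad F(x + β s) ⋅ s, assuming φ'(a) < 0 < φ'(b).
function linesearch_bisection(F, x, s; a = 0.0, b = 1.0, tol = 1e-12)
    dϕ(β) = ForwardDiff.gradient(F, x + β * s)' * s
    while b - a > tol
        β = (a + b) / 2
        dϕ(β) > 0 ? (b = β) : (a = β)  # keep the sign change inside [a, b]
    end
    return (a + b) / 2
end

# Quadratic check: from x = (1,1) along s = -(1,1) the minimizer of ‖x+βs‖² is β = 1
F(x) = x[1]^2 + x[2]^2
β = linesearch_bisection(F, [1.0, 1.0], [-1.0, -1.0]; b = 2.0)
```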

Bisection (generic function with 2 methods)
DFP (generic function with 2 methods)

Example 6

Let us find extremal points of the function

$$f(x,y)=(x+2y-7)^2+(2x+y-5)^2.$$

The function has a minimum at $(1,3)$.

f₆ (generic function with 1 method)
g₆ (generic function with 1 method)

Broyden-Fletcher-Goldfarb-Shanno (BFGS) method

BFGS is an optimization method for finding extremal points of the function $F:\mathbb{R}^n\to\mathbb{R}$, in which case $f(x)=\operatorname{grad} F(x)$.

The method is similar to the DFP method, with somewhat better convergence properties.

Let $F:\mathbb{R}^n\to\mathbb{R}$ be the function whose minimum we seek, and let $f(x)=\operatorname{grad} F(x)$.

Given an initial approximation $x_0$ and a matrix $H_0$, for $k=0,1,2,\ldots$ we compute:

$$\begin{aligned}
s_k &= -H_k f(x_k)\\
\beta_k &= \operatorname*{arg\,min}_\beta F(x_k+\beta s_k)\\
s_k &= \beta_k s_k\\
x_{k+1} &= x_k + s_k\\
y_k &= f(x_{k+1})-f(x_k)\\
H_{k+1} &= \left(I-\frac{s_k y_k^T}{y_k^T s_k}\right) H_k \left(I-\frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k}.
\end{aligned}$$

We can take $H_0=I$. Notice that the iteration step does not require solving a system of linear equations; instead, all updates are performed using $O(n^2)$ operations.

The one-dimensional minimization along the line $x_k+\beta s_k$ is performed by finding a zero of the directional derivative using bisection.
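The iteration can be sketched as follows. This is our own minimal illustration, not the notebook's `BFGS` function; for brevity a crude backtracking step stands in for the exact one-dimensional minimization, and all names are ours.

```julia
using ForwardDiff
using LinearAlgebra

# Minimal BFGS sketch with H₀ = I.
function bfgs_sketch(F, x0; tol = 1e-10, maxiter = 200)
    x = float.(x0)
    H = Matrix{Float64}(I, length(x), length(x))
    g = ForwardDiff.gradient(F, x)
    for k = 1:maxiter
        s = -H * g
        β = 1.0                    # backtrack until F decreases
        while F(x + β * s) > F(x) && β > 1e-12
            β /= 2
        end
        s = β * s
        x += s
        gnew = ForwardDiff.gradient(F, x)
        y = gnew - g
        ρ = 1 / dot(y, s)          # curvature factor 1 / (yᵀ s)
        H = (I - ρ * s * y') * H * (I - ρ * y * s') + ρ * s * s'
        g = gnew
        norm(g) < tol && return x, k
    end
    return x, maxiter
end

# Example 6: minimum of (x + 2y - 7)² + (2x + y - 5)² is at (1, 3)
F(x) = (x[1] + 2x[2] - 7)^2 + (2x[1] + x[2] - 5)^2
x, k = bfgs_sketch(F, [0.0, 0.0])
```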

BFGS (generic function with 2 methods)

Julia packages

The previous programs are simple illustrative implementations of the described algorithms. The Julia package NLsolve.jl is used for solving systems of nonlinear equations, and the package Optim.jl is used for nonlinear optimization.
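For instance, Example 1 can be solved with NLsolve.jl roughly as follows (a sketch; the in-place residual `f!` mirrors the notebook's f₁!):

```julia
using NLsolve

# In-place residual for the system of Example 1
function f!(F, x)
    F[1] = 2(x[1] + x[2])^2 + (x[1] - x[2])^2 - 8
    F[2] = 5x[1]^2 + (x[2] - 3)^2 - 9
end

res = nlsolve(f!, [0.5, 1.1])  # trust-region method by default
res.zero                       # ≈ [1.0, 1.0]
```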

f₁! (generic function with 1 method)
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 0.0]
 * Zero: [-1.1834670032425283, 1.5868371427230779]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.5, 1.1]
 * Zero: [1.0000000000000002, 1.0000000000000002]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6
1
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 1.0]
 * Zero: [-0.7137474114758742, 1.2208868222037403]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5
2
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.8, 1.2]
 * Zero: [0.9999999999940328, 1.0000000000115203]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5
1
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 1.0, 0.0]
 * Zero: [0.0, 2.3283064364709337e-10, 0.0]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6
2
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [1.0, 1.0, 1.0]
 * Zero: [0.0, 2.3283064364709337e-10, 1.223235687436471e-12]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6
3
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [-1.0, 1.0, -10.0]
 * Zero: [0.0, 4.9978109751571114e-11, 8.131707812509303e-17]
 * Inf-norm of residuals: 0.000000
 * Iterations: 20
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 21
 * Jacobian Calls (df/dx): 8
4
Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.5, -1.5, 0.0]
 * Zero: [0.0, -1.0000000000000007, 0.0]
 * Inf-norm of residuals: 0.000000
 * Iterations: 5
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 6
 * Jacobian Calls (df/dx): 6
 * Status: success

 * Candidate solution
    Final objective value:     5.375030e-17

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 5.13e-09 ≰ 0.0e+00
    |x - x'|/|x'|          = 5.13e-09 ≰ 0.0e+00
    |f(x) - f(x')|         = 9.67e-17 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.80e+00 ≰ 0.0e+00
    |g(x)|                 = 2.10e-11 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    35
    f(x) calls:    102
    ∇f(x) calls:   102
 * Status: success

 * Candidate solution
    Final objective value:     3.572548e-12

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 1.87e+05 ≰ 0.0e+00
    |x - x'|/|x'|          = 8.70e-02 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.03e-16 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.69e-04 ≰ 0.0e+00
    |g(x)|                 = 9.69e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    41
    f(x) calls:    137
    ∇f(x) calls:   137